This is a starter RMarkdown template to accompany Data Visualization (Princeton University Press, 2019). You can use it to take notes, write your code, and produce a good-looking, reproducible document that records the work you have done. At the very top of the file is a section of metadata, or information about what the file is and what it does. The metadata is delimited by three dashes at the start and another three at the end. You should change the title, author, and date to the values that suit you. Keep the output line as it is for now, however. Each line in the metadata has a structure. First the key (“title”, “author”, etc), then a colon, and then the value associated with the key.
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. A code chunk is a specially delimited section of the file. You can add one by moving the cursor to a blank line choosing Code > Insert Chunk from the RStudio menu. When you do, an empty chunk will appear:
Code chunks are delimited by three backticks (found to the left of the 1 key on US and UK keyboards) at the start and end. The opening backticks also have a pair of braces and the letter r, to indicate what language the chunk is written in. You write your code inside the code chunks. Write your notes and other material around them, as here.
To install the tidyverse, make sure you have an Internet connection. Then manually run the code in the chunk below. If you knit the document if will be skipped. We do this because you only need to install these packages once, not every time you run this file. Either knit the chunk using the little green “play” arrow to the right of the chunk area, or copy and paste the text into the console window.
## This code will not be evaluated automatically.
## (Notice the eval = FALSE declaration in the options section of the
## code chunk)
my_packages <- c("tidyverse", "broom", "coefplot", "cowplot",
"gapminder", "GGally", "ggrepel", "ggridges", "gridExtra",
"here", "interplot", "margins", "maps", "mapproj",
"mapdata", "MASS", "quantreg", "rlang", "scales",
"survey", "srvyr", "viridis", "viridisLite", "devtools", "tibble")
install.packages(my_packages, repos = "http://cran.rstudio.com")
To begin we must load some libraries we will be using. If we do not load them, R will not be able to find the functions contained in these libraries. The tidyverse includes ggplot and other tools. We also load the socviz and gapminder libraries.
Notice that here, the braces at the start of the code chunk have some additional options set in them. There is the language, r, as before. This is required. Then there is the word setup, which is a label for your code chunk. Labels are useful to briefly say what the chunk does. Label names must be unique (no two chunks in the same document can have the same label) and cannot contain spaces. Then, after the comma, an option is set: include=FALSE. This tells R to run this code but not to include the output in the final document.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows
The remainder of this document contains the chapter headings for the book, and an empty code chunk in each section to get you started. Try knitting this document now by clicking the “Knit” button in the RStudio toolbar, or choosing File > Knit Document from the RStudio menu.
library(tidyverse)
library(socviz)
url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"
organs <- read_csv(file = url)
## Parsed with column specification:
## cols(
## .default = col_double(),
## country = col_character(),
## world = col_character(),
## opt = col_character(),
## consent.law = col_character(),
## consent.practice = col_character(),
## consistent = col_character(),
## ccode = col_character()
## )
## See spec(...) for full column specifications.
library(gapminder)
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point()
library(gapminder)
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_smooth(method ="loess")+ geom_point(alpha = 0.3, mapping = aes(color = factor(year))) + scale_x_log10(labels = scales::dollar) + labs(x = "GDP Per Capita", y = "Life Expectancy in Years", title = "Economic Growth and Life Expectancy", subtitle = "Data points are country-years", caption = "Source: Gapminder.")
#ggsave(here("figures", "lifexp_vs_gdp_gradient.pdf"))
p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(aes(color = "gray70", group = country)) + geom_smooth(size = 1.1, method = "loess", se = FALSE) + scale_y_log10(labels = scales::dollar) + facet_wrap(~continent, ncol = 5) + labs(x = "Year", y = "GDP per capita", title = "GDP per capita on Five Continents")
p <- ggplot(data = gss_sm, mapping = aes( x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(sex ~ race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop.., group = 1))
p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar() + guides(fill = FALSE)
p <- ggplot(data = gss_sm, mapping = aes(x = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop.., group = bigregion))+
facet_wrap(~bigregion, ncol = 2)
p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram(bins = 10)
oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = percollege, fill = state))
p + geom_histogram(alpha = 0.4, bins = 20)
p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_density()
p <- ggplot(data = midwest, mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3)
p <- ggplot(data = subset(midwest,subset = state %in% oh_wi), mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3, mapping = (aes(y = ..scaled..)))
p <- ggplot(data = titanic, mapping = aes(x = fate, y = percent, fill = sex))
p + geom_bar(position = "dodge", stat = "identity") + theme(legend.position ="top")
p <- ggplot(data = oecd_sum, mapping = aes(x = year, y = diff, fill = hi_lo))
p + geom_col() + guides(fill = FALSE) + labs(x = NULL, y = "Difference in Years", title = "The US Life Expectancy Gap", subtitle = "Difference between US and OECD average life expenctancies, 1960-2015", caption = ("Data: OECD, After a chart by Christopher Ingraham, Washington Post, December 27th, 2017."))
## Warning: Removed 1 rows containing missing values (position_stack).
p <- ggplot(data = midwest, mapping = aes(x = percbelowpoverty, y = percollege))
p + geom_density_2d() + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
rel_by_region <- gss_sm %>%
group_by(bigregion, religion) %>%
summarize(N = n()) %>%
mutate (freq = N / sum(N), pct = round((freq*100),0))
## Warning: Factor `religion` contains implicit NA, consider using
## `forcats::fct_explicit_na`
p <- ggplot(rel_by_region, aes(x = bigregion, y = pct, fill = religion))
p + geom_col(position = "dodge2") + labs(x = "Region", y = "Percent", fill = "Religion") + theme(legend.position = "top")
p <- ggplot(rel_by_region, aes(x = religion, y = pct, fill = religion))
p + geom_col(position = "dodge2") + labs(x = NULL, y = "Percent", fill = "Religion") + guides(fill = FALSE) + coord_flip() + facet_grid(~bigregion)
p <- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_line(aes(group = country)) + facet_wrap(~country)
## Warning: Removed 34 rows containing missing values (geom_path).
p <- ggplot(data=organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, fill = world))
p + geom_boxplot() + labs(x = NULL) + coord_flip() + theme(legend.position = "top")
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_jitter(position = position_jitter(width = 0.15)) + labs(x = NULL) + coord_flip() + theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).
by_country <- organdata %>% group_by(consent_law, country) %>% summarize(donors_mean = mean(donors, na.rm = TRUE), donors_sd = sd(donors, na.rm = TRUE), gdp_mean = mean( gdp, na.rm = TRUE), health_mean = mean(health, na.rm = TRUE), roads_mean = mean(roads, na.rm = TRUE), cerebvas_mean = mean(cerebvas, na.rm = TRUE))
by_country <- organdata %>% group_by(consent_law, country) %>% summarize_if(is.numeric, funs(mean, sd), na.rm =
TRUE) %>% ungroup()
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.
p <- ggplot(data = by_country, mapping = aes(x = donors_mean, y = reorder(country, donors_mean), color = consent_law))
p + geom_point(size = 3) + labs(x = "Donor Procurement Rate", y = "", color = "Consent Law") + theme(legend.position = "top")
p <- ggplot(data = by_country, mapping = aes(x = donors_mean, y = reorder(country, donors_mean)))
p + geom_point(size = 3) + facet_wrap(~consent_law, scales = "free_y", ncol = 1) + labs(x = "Donor Procurement Rate", y = "")
p <- ggplot(data = by_country, mapping = aes(x = reorder(country, donors_mean), y = donors_mean))
p + geom_pointrange(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) + labs(x = "Donor Procurement Rate", y = "") + coord_flip()
p <- ggplot(data = by_country, mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country),hjust = 0)
library(ggrepel)
p_title <- "Presidential Elections: Popular & Electoral College Margins"
p_subtitle <- "1824-2016"
p_caption <- "Data for 2016 are provisional"
x_label <- "Winner's share of Popular Vote"
y_label <- "Winner's share of Electoral College Vote"
p <- ggplot(elections_historic, aes(x = popular_pct, y = ec_pct, label = winner_label, color = win_party))
p + geom_hline(yintercept = 0.5, size = 1.4, color = "gray80") +
geom_vline(xintercept = 0.5, size = 1.4, color = "gray80") +
geom_point() +
geom_text_repel() +
scale_x_continuous(labels = scales::percent) +
scale_y_continuous(labels = scales::percent) +
labs(x = x_label, y = y_label, title = p_title, subtitle = p_subtitle, caption = p_caption)
p <- ggplot(data = by_country, mapping = aes(x = gdp_mean, y = health_mean))
p + geom_point() + geom_text_repel(data = subset(by_country, gdp_mean > 25000), mapping = aes(label = country))
p <- ggplot(data = by_country, mapping = aes(x = gdp_mean, y = health_mean))
p + geom_point() + geom_text_repel(data = subset(by_country, gdp_mean > 25000 | health_mean < 1500), mapping = aes(label = country))
organdata$ind <- organdata$ccode %in% c("Ita", "Spa") & organdata$year > 1998
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = ind))
p + geom_point() + geom_text_repel(data = subset(organdata, ind), mapping = aes(label = ccode)) + guides(label = FALSE, color = FALSE)
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot2::ggplot(data = organdata, mapping = aes(x = roads, y = donors))
p + geom_point() + annotate(geom = "text", x = 91, y = 35, label = "A surprisingly high \n recovery rate.", hjust = 0)
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot2::ggplot(data = organdata, mapping = aes(x = roads, y = donors))
p + geom_point() + annotate(geom = "rect", xmin = 125, xmax = 155, ymin = 30, ymax = 35, fill = "red", alpha = 0.2) + annotate(geom = "text", x = 157, y = 33, label = "A surprisingly high \n recovery rate.", hjust = 0 )
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot2::ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point() + scale_x_log10() + scale_y_continuous(breaks = c(5, 15, 25), labels = c("Five", "Fifteen", "Twenty Five"))
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot2::ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point() + scale_color_discrete(labels = c("Corporatist", "Liberal", "Social Democratic", "Unclassified")) + labs(x = "Road Deaths", y = "Donor Procurement", color = "Welfare State") + theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).
## Work with Models
p <- ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp))
p + geom_point(alpha = 0.1) + geom_smooth(color = "tomato", fill = "tomato", method = MASS::rlm) + geom_smooth(color = "steelblue", fill = "steelblue", method = "lm")
p + geom_point(alpha=0.1) + geom_smooth(color = "tomato", method = "lm", size = 1.2, formula = y ~ splines::bs(x, 3), se = FALSE)
p + geom_point(alpha=0.1) + geom_quantile(color = "tomato", size = 1.2, method = "rqss", lambda = 1, quantiles = c(0.20, 0.5, 0.85))
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 1)
model_colors <- RColorBrewer::brewer.pal(3, "Set1")
model_colors
## [1] "#E41A1C" "#377EB8" "#4DAF4A"
p0 <- ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp))
p1 <- p0 + geom_point(alpha = 0.2) + geom_smooth(method = "lm", aes(color = "OLS", fill = "OLS")) + geom_smooth(method = "lm", formula = y ~ splines::bs(x, df = 3), aes(color = "Cubic Spline", fill = "Cubic Spline")) + geom_smooth(method = "loess", aes(color = "LOESS", fill = "LOESS"))
p1 + scale_color_manual(name = "Models", values = model_colors) + scale_fill_manual(name = "Models", values = model_colors) + theme(legend.position = "top")
out <- lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
summary(out)
##
## Call:
## lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.161 -4.486 0.297 5.110 25.175
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.781e+01 3.395e-01 140.819 < 2e-16 ***
## gdpPercap 4.495e-04 2.346e-05 19.158 < 2e-16 ***
## pop 6.570e-09 1.975e-09 3.326 0.000901 ***
## continentAmericas 1.348e+01 6.000e-01 22.458 < 2e-16 ***
## continentAsia 8.193e+00 5.712e-01 14.342 < 2e-16 ***
## continentEurope 1.747e+01 6.246e-01 27.973 < 2e-16 ***
## continentOceania 1.808e+01 1.782e+00 10.146 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.365 on 1697 degrees of freedom
## Multiple R-squared: 0.5821, Adjusted R-squared: 0.5806
## F-statistic: 393.9 on 6 and 1697 DF, p-value: < 2.2e-16
min_gdp <- min(gapminder$gdpPercap)
max_gdp <- max(gapminder$gdpPercap)
med_pop <- median(gapminder$pop)
pred_df <- expand.grid(gdpPercap = (seq(from = min_gdp, to = max_gdp, length.out = 100)), pop = med_pop, continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"))
dim(pred_df)
## [1] 500 3
pred_out <- predict(object = out, newdata = pred_df, interval = "predict")
head(pred_out)
## fit lwr upr
## 1 47.96863 31.54775 64.38951
## 2 48.48298 32.06231 64.90365
## 3 48.99733 32.57670 65.41797
## 4 49.51169 33.09092 65.93245
## 5 50.02604 33.60497 66.44711
## 6 50.54039 34.11885 66.96193
pred_df <- cbind(pred_df, pred_out)
head(pred_df)
## gdpPercap pop continent fit lwr upr
## 1 241.1659 7023596 Africa 47.96863 31.54775 64.38951
## 2 1385.4282 7023596 Africa 48.48298 32.06231 64.90365
## 3 2529.6905 7023596 Africa 48.99733 32.57670 65.41797
## 4 3673.9528 7023596 Africa 49.51169 33.09092 65.93245
## 5 4818.2150 7023596 Africa 50.02604 33.60497 66.44711
## 6 5962.4773 7023596 Africa 50.54039 34.11885 66.96193
p <- ggplot(data = subset(pred_df, continent %in% c("Europe", "Africa")), aes(x = gdpPercap, y = fit, ymin = lwr, ymax = upr, color = continent, fill = continent, group = continent))
p + geom_point(data = subset(gapminder, continent %in% c("Europe","Africa")), aes(x = gdpPercap, y = lifeExp, color = continent), alpha = 0.5, inherit.aes = FALSE) + geom_line() + geom_ribbon(alpha = 0.2, color = FALSE) + scale_x_log10(labels = scales::dollar) + theme(legend.position = "top")
library(broom)
out_comp <- tidy(out)
out_comp %>% round_df()
## # A tibble: 7 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 47.8 0.34 141. 0
## 2 gdpPercap 0 0 19.2 0
## 3 pop 0 0 3.33 0
## 4 continentAmericas 13.5 0.6 22.5 0
## 5 continentAsia 8.19 0.570 14.3 0
## 6 continentEurope 17.5 0.62 28.0 0
## 7 continentOceania 18.1 1.78 10.2 0
p <- ggplot(out_comp, mapping = aes(x = term, y = estimate))
p + geom_point()+ coord_flip()
out_conf <- tidy(out, conf.int = TRUE)
out_conf %>% round_df()
## # A tibble: 7 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 47.8 0.34 141. 0 47.2 48.5
## 2 gdpPercap 0 0 19.2 0 0 0
## 3 pop 0 0 3.33 0 0 0
## 4 continentAmericas 13.5 0.6 22.5 0 12.3 14.6
## 5 continentAsia 8.19 0.570 14.3 0 7.07 9.31
## 6 continentEurope 17.5 0.62 28.0 0 16.2 18.7
## 7 continentOceania 18.1 1.78 10.2 0 14.6 21.6
out_conf <- subset(out_conf, term %nin% "(Intercept)")
out_conf$nicelabs <- prefix_strip(out_conf$term, "continent")
p <- ggplot(out_conf, mapping = aes(x = reorder(nicelabs, estimate),y = estimate, ymin = conf.low, ymax = conf.high))
p + geom_pointrange() + coord_flip() + labs(x = "", y = "OLS Estimate")
out_aug <- augment(out, data = gapminder)
head(out_aug) %>% round_df()
## # A tibble: 6 x 13
## country continent year lifeExp pop gdpPercap .fitted .se.fit .resid .hat
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan~ Asia 1952 28.8 8.43e6 779. 56.4 0.47 -27.6 0
## 2 Afghan~ Asia 1957 30.3 9.24e6 821. 56.4 0.47 -26.1 0
## 3 Afghan~ Asia 1962 32 1.03e7 853. 56.5 0.47 -24.5 0
## 4 Afghan~ Asia 1967 34.0 1.15e7 836. 56.5 0.47 -22.4 0
## 5 Afghan~ Asia 1972 36.1 1.31e7 740. 56.4 0.47 -20.3 0
## 6 Afghan~ Asia 1977 38.4 1.49e7 786. 56.5 0.47 -18.0 0
## # ... with 3 more variables: .sigma <dbl>, .cooksd <dbl>, .std.resid <dbl>
p <- ggplot(data = out_aug, mapping = aes(x = .fitted, y = .resid))
p + geom_point()
glance(out) %>% round_df()
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.580 0.580 8.37 394. 0 7 -6034. 12084. 12127.
## # ... with 2 more variables: deviance <dbl>, df.residual <dbl>
library(survival)
out_cph <- coxph(Surv(time, status) ~ age + sex, data = lung)
out_surv <- survfit(out_cph)
out_tidy <- tidy(out_surv)
p <- ggplot(data = out_tidy, mapping = aes(time, estimate))
p + geom_line() + geom_ribbon(mapping = aes(ymin = conf.low, ymax = conf.high), alpha = 0.2)
eu77 <- gapminder %>% filter(continent == "Europe", year == 1977)
fit <- lm(lifeExp ~ log(gdpPercap), data = eu77)
summary(fit)
##
## Call:
## lm(formula = lifeExp ~ log(gdpPercap), data = eu77)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4956 -1.0306 0.0935 1.1755 3.7125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.489 7.161 4.118 0.000306 ***
## log(gdpPercap) 4.488 0.756 5.936 2.17e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.114 on 28 degrees of freedom
## Multiple R-squared: 0.5572, Adjusted R-squared: 0.5414
## F-statistic: 35.24 on 1 and 28 DF, p-value: 2.173e-06
out_le <- gapminder %>% group_by(continent, year) %>% nest()
out_le
## # A tibble: 60 x 3
## # Groups: continent, year [60]
## continent year data
## <fct> <int> <list<df[,4]>>
## 1 Asia 1952 [33 x 4]
## 2 Asia 1957 [33 x 4]
## 3 Asia 1962 [33 x 4]
## 4 Asia 1967 [33 x 4]
## 5 Asia 1972 [33 x 4]
## 6 Asia 1977 [33 x 4]
## 7 Asia 1982 [33 x 4]
## 8 Asia 1987 [33 x 4]
## 9 Asia 1992 [33 x 4]
## 10 Asia 1997 [33 x 4]
## # ... with 50 more rows
out_le %>% filter(continent == "Europe" & year == 1977) %>% unnest()
## # A tibble: 30 x 6
## # Groups: continent, year [5]
## continent year country lifeExp pop gdpPercap
## <fct> <int> <fct> <dbl> <int> <dbl>
## 1 Europe 1977 Albania 68.9 2509048 3533.
## 2 Europe 1977 Austria 72.2 7568430 19749.
## 3 Europe 1977 Belgium 72.8 9821800 19118.
## 4 Europe 1977 Bosnia and Herzegovina 69.9 4086000 3528.
## 5 Europe 1977 Bulgaria 70.8 8797022 7612.
## 6 Europe 1977 Croatia 70.6 4318673 11305.
## 7 Europe 1977 Czech Republic 70.7 10161915 14800.
## 8 Europe 1977 Denmark 74.7 5088419 20423.
## 9 Europe 1977 Finland 72.5 4738902 15605.
## 10 Europe 1977 France 73.8 53165019 18293.
## # ... with 20 more rows
fit_ols <- function(df) {
lm(lifeExp ~ log(gdpPercap), data = df)
}
out_tidy <- gapminder %>% group_by(continent, year) %>% nest() %>% mutate(model = map(data, fit_ols), tidied = map(model, tidy)) %>% unnest(tidied, .drop = TRUE) %>% filter(term %nin% "(Intercept)" & continent %nin% "Oceania")
# out_tidy %>% sample_n(5)
p <- ggplot(data = out_tidy, mapping = aes(x = year, y = estimate, ymin = estimate - 2*std.error, ymax = estimate + 2*std.error, group = continent, color = continent))
p + geom_pointrange(position = position_dodge(width = 1)) + scale_x_continuous(breaks = unique(gapminder$year)) + theme(legend.position = "top") + labs(x = "Year", y = "Estimate", color = "Continent")
library(margins)
gss_sm$polviews_m <- relevel(gss_sm$polviews, ref = "Moderate")
out_bo <- glm(obama ~ polviews_m + sex*race, family = "binomial", data = gss_sm)
summary(out_bo)
##
## Call:
## glm(formula = obama ~ polviews_m + sex * race, family = "binomial",
## data = gss_sm)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9045 -0.5541 0.1772 0.5418 2.2437
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.296493 0.134091 2.211 0.02703 *
## polviews_mExtremely Liberal 2.372950 0.525045 4.520 6.20e-06 ***
## polviews_mLiberal 2.600031 0.356666 7.290 3.10e-13 ***
## polviews_mSlightly Liberal 1.293172 0.248435 5.205 1.94e-07 ***
## polviews_mSlightly Conservative -1.355277 0.181291 -7.476 7.68e-14 ***
## polviews_mConservative -2.347463 0.200384 -11.715 < 2e-16 ***
## polviews_mExtremely Conservative -2.727384 0.387210 -7.044 1.87e-12 ***
## sexFemale 0.254866 0.145370 1.753 0.07956 .
## raceBlack 3.849526 0.501319 7.679 1.61e-14 ***
## raceOther -0.002143 0.435763 -0.005 0.99608
## sexFemale:raceBlack -0.197506 0.660066 -0.299 0.76477
## sexFemale:raceOther 1.574829 0.587657 2.680 0.00737 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2247.9 on 1697 degrees of freedom
## Residual deviance: 1345.9 on 1686 degrees of freedom
## (1169 observations deleted due to missingness)
## AIC: 1369.9
##
## Number of Fisher Scoring iterations: 6
bo_m <- margins(out_bo)
summary(bo_m)
## factor AME SE z p lower
## polviews_mConservative -0.4119 0.0283 -14.5394 0.0000 -0.4674
## polviews_mExtremely Conservative -0.4538 0.0420 -10.7971 0.0000 -0.5361
## polviews_mExtremely Liberal 0.2681 0.0295 9.0996 0.0000 0.2103
## polviews_mLiberal 0.2768 0.0229 12.0736 0.0000 0.2319
## polviews_mSlightly Conservative -0.2658 0.0330 -8.0596 0.0000 -0.3304
## polviews_mSlightly Liberal 0.1933 0.0303 6.3896 0.0000 0.1340
## raceBlack 0.4032 0.0173 23.3568 0.0000 0.3694
## raceOther 0.1247 0.0386 3.2297 0.0012 0.0490
## sexFemale 0.0443 0.0177 2.5073 0.0122 0.0097
## upper
## -0.3564
## -0.3714
## 0.3258
## 0.3218
## -0.2011
## 0.2526
## 0.4371
## 0.2005
## 0.0789
bo_gg <- as_tibble(summary(bo_m))
prefixes <- c("polviews_m", "sex")
bo_gg$factor <- prefix_strip(bo_gg$factor, prefixes)
bo_gg$factor <- prefix_replace(bo_gg$factor, "race", "Race: ")
bo_gg %>% select(factor, AME, lower, upper)
## # A tibble: 9 x 4
## factor AME lower upper
## <chr> <dbl> <dbl> <dbl>
## 1 Conservative -0.412 -0.467 -0.356
## 2 Extremely Conservative -0.454 -0.536 -0.371
## 3 Extremely Liberal 0.268 0.210 0.326
## 4 Liberal 0.277 0.232 0.322
## 5 Slightly Conservative -0.266 -0.330 -0.201
## 6 Slightly Liberal 0.193 0.134 0.253
## 7 Race: Black 0.403 0.369 0.437
## 8 Race: Other 0.125 0.0490 0.200
## 9 Female 0.0443 0.00967 0.0789
p <- ggplot(data = bo_gg, aes(x = reorder(factor, AME), y = AME, ymin = lower, ymax = upper))
p + geom_hline(yintercept = 0, color = "gray80") + geom_pointrange() + coord_flip() + labs(x = NULL, y = "Average Marginal Effect")
pv_cp <- cplot(out_bo, x = "sex", draw = FALSE)
## xvals yvals upper lower
## 1 Male 0.5735849 0.6378653 0.5093045
## 2 Female 0.6344507 0.6887845 0.5801169
p <- ggplot(data = pv_cp,aes(x = reorder(xvals, yvals),y = yvals, ymin = lower, ymax = upper))
p + geom_hline(yintercept = 0, color = "gray80") + geom_pointrange() + coord_flip() + labs(x = NULL, y = "Conditional Effect")
library(survey)
## Loading required package: grid
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
library(srvyr)
##
## Attaching package: 'srvyr'
## The following object is masked from 'package:stats':
##
## filter
options(survey.lonely.psu = "adjust")
options(na.action = "na.pass")
gss_wt <- subset(gss_lon, year > 1974) %>% mutate(stratvar = interaction(year, vstrat)) %>% as_survey_design(ids = vpsu, strata = stratvar, weights = wtssall, nest = TRUE)
out_grp <- gss_wt %>% filter(year %in% seq(1976, 2016, by = 4)) %>% group_by(year, race, degree) %>% summarize(prop = survey_mean(na.rm = TRUE))
out_grp
## # A tibble: 162 x 5
## year race degree prop prop_se
## <dbl> <fct> <fct> <dbl> <dbl>
## 1 1976 White Lt High School 0.328 0.0160
## 2 1976 White High School 0.518 0.0162
## 3 1976 White Junior College 0.0129 0.00298
## 4 1976 White Bachelor 0.101 0.00960
## 5 1976 White Graduate 0.0393 0.00644
## 6 1976 White <NA> NA NA
## 7 1976 Black Lt High School 0.562 0.0611
## 8 1976 Black High School 0.337 0.0476
## 9 1976 Black Junior College 0.0426 0.0193
## 10 1976 Black Bachelor 0.0581 0.0239
## # ... with 152 more rows
out_mrg <- gss_wt %>% filter(year %in% seq(1976, 2016, by = 4)) %>% mutate(racedeg = interaction(race, degree)) %>% group_by(year, racedeg) %>% summarize(prop = survey_mean(na.rm = TRUE))%>% separate(racedeg, sep = "\\.", into = c("race","degree"))
out_mrg
## # A tibble: 155 x 5
## year race degree prop prop_se
## <dbl> <chr> <chr> <dbl> <dbl>
## 1 1976 White Lt High School 0.298 0.0146
## 2 1976 Black Lt High School 0.0471 0.00840
## 3 1976 Other Lt High School 0.00195 0.00138
## 4 1976 White High School 0.471 0.0160
## 5 1976 Black High School 0.0283 0.00594
## 6 1976 Other High School 0.00325 0.00166
## 7 1976 White Junior College 0.0117 0.00268
## 8 1976 Black Junior College 0.00357 0.00162
## 9 1976 White Bachelor 0.0919 0.00888
## 10 1976 Black Bachelor 0.00487 0.00214
## # ... with 145 more rows
p <- ggplot(data = subset(out_grp, race %nin% "Other"), mapping = aes(x = degree, y = prop, ymin = prop - 2*prop_se, ymax = prop + 2*prop_se, fill = race, color = race, group = race))
dodge <- position_dodge(width = 0.9)
p + geom_col(position = dodge, alpha = 0.2) + geom_errorbar(position = dodge, width = 0.2) + scale_x_discrete(labels = scales::wrap_format(10)) + scale_y_continuous(labels = scales::percent) +
scale_color_brewer(type = "qual", palette = "Dark2") + scale_fill_brewer(type = "qual", palette = "Dark2") + labs(title = "Educational Attainment by Race", subtitle = "GSS 1976-2016", fill = "Race", color = "Race", x = NULL, y = "Percent") + facet_wrap(~year, ncol = 2) + theme(legend.position = "top")
p <- ggplot(data = subset(out_grp, race %nin% "Other"), mapping = aes(x = year, y = prop, ymin = prop - 2*prop_se, ymax = prop + 2*prop_se, fill = race, color = race, group = race))
p + geom_ribbon(alpha = 0.3, aes(color = NULL)) + geom_line() + facet_wrap(~degree, ncol = 5) + scale_y_continuous(labels = scales::percent) + scale_color_brewer(type = "qual", palette = "Dark2")+ scale_fill_brewer(type = "qual", palette = "Dark2") + labs(title = "Educational Attainment by Race", subtitle = "GSS 1976-2016", fill = "Race", color = "Race", x = NULL, y = "Percent") + theme(legend.position = "top")
out <- lm(formula = lifeExp ~ log(gdpPercap) + pop + continent, data = gapminder)
plot(out, which = c(1,2), ask = FALSE)
library(coefplot)
out <- lm( formula = lifeExp ~ log(gdpPercap) + log(pop) + continent, data = gapminder)
coefplot(out, sort = "magnitude", intercept = FALSE)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
##
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
##
## nasa
organdata_sm <- organdata %>% select(donors, pop_dens, pubhealth, roads, consent_law)
ggpairs(data = organdata_sm, mapping = aes(color = consent_law), upper = list(continuous = wrap("density"), combo = "box_no_facet"),lower = list(continuous = wrap("points"), combo = wrap("dot_no_facet")))
# Hex color codes are for Dem Blue and Rep Red
party_colors <- c("#2E74C0", "#CB454A")
p0 <- ggplot(data = subset(election, st %nin% "DC"), mapping = aes(x = r_points, y = reorder(state, r_points), color = party))
p1 <- p0 + geom_vline(xintercept = 0, color = "gray30") + geom_point(size = 2)
p2 <- p1 + scale_color_manual(values = party_colors)
p3 <- p2 + scale_x_continuous(breaks = c(-30, -20, -10, 0, 10, 20, 30, 40), labels = c("30\n(Clinton)", "20", "10", "0", "10", "20", "30", "40\n(Trump"))
p3 + facet_wrap(~census, ncol = 1, scales = "free_y") + guides(color = FALSE) + labs(x = "Point Margin", y = "") + theme(axis.text = element_text(size = 8))
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
library(ggthemes)
us_states <- map_data("state")
head(us_states)
## long lat group order region subregion
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
dim(us_states)
## [1] 15537 6
p <- ggplot(data = us_states, mapping = aes(x = long, y = lat, group = group, fill = region))
p + geom_polygon(color = "Gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39 , lat1 = 45) + guides(fill = FALSE)
election$region <- tolower(election$state)
us_states_elec <- left_join(us_states, election)
## Joining, by = "region"
p <- ggplot(data = us_states_elec, aes(x = long, y = lat, group = group, fill = party))
p + geom_polygon(color = "Gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p0 <- ggplot(data=us_states_elec, mapping=aes(x=long,y=lat,group=group,fill=party))
p1 <- p0+geom_polygon(color="gray90", size=0.1)+coord_map(projection="albers",lat0=39,lat1=45)
p2 <- p1+scale_fill_manual(values=party_colors)+labs(title="Election Result 2016",fill=NULL)
p2 + theme_map()
p0 <- ggplot(data=us_states_elec, mapping=aes(x=long, y=lat, group=group, fill= pct_trump))
p1 <- p0 + geom_polygon(color="Gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p1 + labs(title = "Trump vote") + theme_map() + labs(fill = "Percent")
p2 <- p1 + scale_fill_gradient(low = "white", high = "#CB454A") + labs(title = "Trump vote")
p2 + theme_map() + labs(fill = "Percent")
p0 <- ggplot(data = us_states_elec, mapping = aes(x = long, y = lat, group = group, fill = d_points))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_gradient2() + labs(title = "Winning margins")
p2 + theme_map() + labs(fill = "Percent")
p3 <- p1 + scale_fill_gradient2(low = "red", mid = scales::muted("purple"), high = "blue", breaks = c(-25,0,25,50,75)) + labs(title = "Winning margins")
p3 + theme_map() + labs(fill = "Percent")
p0 <- ggplot(data = subset(us_states_elec, region %nin% "district of columbia"), mapping = aes(x = long, y = lat, group = group, fill = d_points))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_gradient2(low = "red", mid = scales::muted("purple"), high = "blue") + labs(title = "Winning margins")
p2 + theme_map() + labs(fill = "Percent")
county_map %>% sample_n(5)
## long lat order hole piece group id
## 1 1423245.7 -836424.48 149769 FALSE 1 0500000US47001.1 47001
## 2 2195588.5 337747.38 168474 FALSE 1 0500000US50009.1 50009
## 3 415517.4 -1378725.57 164893 FALSE 1 0500000US48467.1 48467
## 4 852096.7 -1140827.79 96706 FALSE 1 0500000US28027.1 28027
## 5 2376294.3 83260.63 84898 FALSE 1 0500000US25023.1 25023
county_data %>% select(id, name, state, pop_dens, pct_black) %>% sample_n(5)
## id name state pop_dens pct_black
## 1 48473 Waller County TX [ 50, 100) [25.0,50.0)
## 2 19055 Delaware County IA [ 10, 50) [ 0.0, 2.0)
## 3 39095 Lucas County OH [ 1000, 5000) [15.0,25.0)
## 4 28029 Copiah County MS [ 10, 50) [50.0,85.3]
## 5 45000 41 SC [ 100, 500) [25.0,50.0)
county_full <- left_join(county_map, county_data, by = "id")
p0 <- ggplot(data = county_full, mapping = aes(x = long, y = lat, group = group, fill = pop_dens))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.5) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette = "Blues", labels = c("0-10", "10-50", "50-100", "100-500", "500-1,000", "1,000-5,000", ">5,000"))
p2 + labs(fill = "Population per\nsquare mile") + theme_map() + guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")
p0 <- ggplot(data = county_full, mapping = aes(x = long, y = lat, group = group, fill = pct_black))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.5) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette = "Greens")
p2 + labs(fill = "US Population, Percent Black") + guides(fill = guide_legend(nrow = 1)) + theme_map() + theme(legend.position = "bottom")
orange_pal <- RColorBrewer::brewer.pal(n = 6, name = "Oranges")
orange_pal
## [1] "#FEEDDE" "#FDD0A2" "#FDAE6B" "#FD8D3C" "#E6550D" "#A63603"
orange_rev <- rev(orange_pal)
orange_rev
## [1] "#A63603" "#E6550D" "#FD8D3C" "#FDAE6B" "#FDD0A2" "#FEEDDE"
gun_p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, group = group, fill = su_gun6))
gun_p1 <- gun_p + geom_polygon(color = "gray90", size = 0.5) + coord_equal()
gun_p2 <- gun_p1 + scale_fill_manual(values = orange_pal)
gun_p2 + labs(title = "Gun-Related Suicides, 1999-2015", fill = "Rate per 100,000 pop.") + theme_map() + theme(legend.position = "bottom")
pop_p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, group = group, fill = pop_dens6))
pop_p1 <- pop_p + geom_polygon(color = "gray90", size = 0.5) + coord_equal()
pop_p2 <- pop_p1 + scale_fill_manual(values = orange_rev)
pop_p2 + labs(title = "Reverse-coded Population Density", fill = "People per square mile") + theme_map() + theme(legend.position = "bottom")
library(statebins)
statebins_continuous(state_data = election, state_col = "state", text_color = "white", value_col = "pct_trump", brewer_pal = "Reds", font_size = 3, legend_title = "Percent Trump")
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.
statebins_continuous(state_data = subset(election, st %nin% "DC"), state_col = "state", text_color = "black", value_col = "pct_clinton", brewer_pal = "Blues", font_size = 3, legend_title = "Percent Clinton")
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.
election <- election %>% mutate(color = recode(party, Republican = "darkred", Democrat = "royalblue"))
statebins_manual(state_data = election, state_col = "st", color_col = "color", text_color = "white", font_size = 3, legend_title = "Winner", labels = c("Trump", "Clinton"), legend_position = "right")
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.
statebins(state_data = election, state_col = "state", value_col = "pct_trump", text_color = "white", breaks = 4, labels = c("4-21", "21-37", "37-53", "53-70"), brewer_pal = "Reds", font_size = 3, legend_title = "Percent Trump")
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.
opiates$region <- tolower(opiates$state)
opiates_map <- left_join(us_states, opiates)
## Joining, by = "region"
library(viridis)
## Loading required package: viridisLite
p0 <- ggplot(data = subset(opiates_map, year > 1999), mapping = aes(x = long, y = lat, group = group, fill = adjusted))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.05) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_viridis_c(option = "plasma")
p2 + theme_map() + facet_wrap(~year,ncol = 3) + theme(legend.position = "bottom", strip.background = element_blank()) + labs(fill = "Death rate per 100,000 population ", title = "Opiate Related Deaths by State, 2000-2014")
p <- ggplot(data = opiates, mapping = aes(x = year, y = adjusted, group = state))
p + geom_line(color = "gray70")
## Warning: Removed 17 rows containing missing values (geom_path).
p0 <- ggplot(data = drop_na(opiates, division_name), mapping = aes(x = year, y = adjusted))
p1 <- p0 + geom_line(color = "gray70", mapping = aes(group = state))
p2 <- p1 + geom_smooth(mapping = aes(group = division_name), se = FALSE)
p3 <- p2 + geom_text_repel(data = subset(opiates, year == max(year) & abbr !="DC"), mapping = aes(x = year, y = adjusted, label = abbr),
size = 1.8, segment_color = NA, nudge_x = 30) + coord_cartesian(c(min(opiates$year), max(opiates$year)))
## Warning: Ignoring unknown parameters: segment_colour
p3 + labs(x = "", y = "Rate per 100,000 population", title = "State-Level Opiate Death Rates by Census Division, 1999-2014") + facet_wrap(~ reorder(division_name, -adjusted, na.rm = TRUE), ncol = 3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 17 rows containing missing values (geom_path).
head(asasec)
## Section Sname Beginning Revenues
## 1 Aging and the Life Course (018) Aging 12752 12104
## 2 Alcohol, Drugs and Tobacco (030) Alcohol/Drugs 11933 1144
## 3 Altruism and Social Solidarity (047) Altruism 1139 1862
## 4 Animals and Society (042) Animals 473 820
## 5 Asia/Asian America (024) Asia 9056 2116
## 6 Body and Embodiment (048) Body 3408 1618
## Expenses Ending Journal Year Members
## 1 12007 12849 No 2005 598
## 2 400 12677 No 2005 301
## 3 1875 1126 No 2005 NA
## 4 1116 177 No 2005 209
## 5 1710 9462 No 2005 365
## 6 1920 3106 No 2005 NA
p <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
p <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p + geom_point(mapping = aes(color = Journal)) + geom_smooth(method = "lm")
p0 <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p1 <- p0 + geom_point(mapping = aes(color = Journal)) + geom_smooth(method = "lm", se = FALSE, color = "gray80")
p2 <- p1 + geom_text_repel(data = subset(asasec, Year == 2014 & Revenues > 7000), size = 2)
p3 <- p2 + labs(x = "Membership", y = "Revenues", color = "Section has own Journal", title = "ASA Sections", subtitle = "2014 Calendar year.", caption = "Source: ASA annual report.")
p4 <- p3 + scale_y_continuous(labels = scales::dollar) + theme(legend.position = "top")
p4
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point(size = 2) + scale_color_brewer(palette = "Set2") + theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).
p + geom_point(size = 2) + scale_color_brewer(palette = "Pastel2") + theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).
p + geom_point(size = 2) + scale_color_brewer(palette = "Dark2") + theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).
#Democratic Blue and Republican Red
party_colors <- c("#2E74C0", "#CB454A")
p0 <- ggplot(data = subset(county_data, flipped =="No"), mapping = aes(x = pop, y = black/100))
p1 <- p0 + geom_point(alpha = 0.15, color = "gray50") + scale_x_log10(labels = scales::comma)
p1
p2 <- p1 + geom_point(data = subset(county_data, flipped == "Yes"), mapping = aes(x=pop, y = black/100, color = partywinner16)) + scale_color_manual(values = party_colors)
p2
p3 <- p2 + scale_y_continuous(labels = scales::percent) + labs(color = "County flipped to ...", x = "County Population (log scale)", y = "Percent Black Population", title = "Flipped counties, 2016", caption = "Counties in gray did not flip.")
p3
p4 <- p3 + geom_text_repel(data = subset(county_data, flipped == "Yes" & black > 25), mapping = aes(x = pop, y = black/100, label = state), size = 2)
p4 + theme_minimal() + theme(legend.position = "top")
theme_set(theme_bw())
p4 + theme(legend.position = "top")
theme_set(theme_dark())
p4 + theme(legend.position = "top")
library(ggthemes)
theme_set(theme_economist())
p4 + theme(legend.position = "top")
theme_set(theme_wsj())
p4 + theme(plot.title = element_text(size = rel(0.6)),
legend.title = element_text(size = rel(0.35)),
plot.caption = element_text(size = rel(0.35)),
legend.position = "top")
yrs <- c(seq(1972,1988, 4), 1993, seq(1996,2016,4))
mean_age <- gss_lon %>% filter(age %nin% NA && year %in% yrs) %>% group_by(year) %>% summarize(xbar = round(mean(age, na.rm = TRUE), 0))
mean_age$y <- 0.3
library(cowplot)
##
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
## default ggplot2 theme anymore. To recover the previous
## behavior, execute:
## theme_set(theme_cowplot())
## ********************************************************
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggthemes':
##
## theme_map
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see http://bit.ly/arialnarrow
yr_labs <- data.frame(x = 85, y = 0.8, year = yrs)
p <- ggplot(data = subset(gss_lon, year %in% yrs), mapping = aes(x = age))
p1 <- p + geom_density(fill = "gray20", color = FALSE, alpha = 0.9, mapping = aes(y = ..scaled..)) +
geom_vline(data = subset(mean_age, year %in% yrs), aes(xintercept = xbar), color = "white", size = 0.5) +
geom_text(data = subset(mean_age, year %in% yrs), aes(x = xbar, y = y, label = xbar), nudge_x = 7.5, color = "white", size = 3.5, hjust = 1) +
geom_text(data = subset(yr_labs, year %in% yrs), aes(x = x, y = y, label = year)) + facet_grid(year ~ ., switch = "y")
theme_set(theme_base())
p1 + theme(legend.position = "top")
## Warning: Removed 83 rows containing non-finite values (stat_density).
library(ggridges)
##
## Attaching package: 'ggridges'
## The following object is masked from 'package:ggplot2':
##
## scale_discrete_manual
p <- ggplot(data = gss_lon, mapping = aes(x = age, y = factor(year, levels = rev(unique(year)), ordered = TRUE)))
p + geom_density_ridges(alpha = 0.6, fill = "lightblue", scale = 1.5) + scale_x_continuous(breaks = c(25,50,75)) +
scale_y_discrete(expand = c(0.01,0)) +
labs(x = "Age", y = NULL,
title = "Age Distribution of\nGSS Respondents") +
theme_ridges() +
theme(title = element_text(size = 16, face = "bold"))
## Picking joint bandwidth of 3.49
## Warning: Removed 221 rows containing non-finite values (stat_density_ridges).
head(fredts)
## date sp500 monbase sp500_i monbase_i
## 1 2009-03-11 696.68 1542228 100.0000 100.0000
## 2 2009-03-18 766.73 1693133 110.0548 109.7849
## 3 2009-03-25 799.10 1693133 114.7012 109.7849
## 4 2009-04-01 809.06 1733017 116.1308 112.3710
## 5 2009-04-08 830.61 1733017 119.2240 112.3710
## 6 2009-04-15 852.21 1789878 122.3245 116.0579
fredts_m <- fredts %>% select(date, sp500_i, monbase_i) %>% gather(key = series, value = score, sp500_i:monbase_i)
head(fredts_m)
## date series score
## 1 2009-03-11 sp500_i 100.0000
## 2 2009-03-18 sp500_i 110.0548
## 3 2009-03-25 sp500_i 114.7012
## 4 2009-04-01 sp500_i 116.1308
## 5 2009-04-08 sp500_i 119.2240
## 6 2009-04-15 sp500_i 122.3245
p <- ggplot(data = fredts_m, mapping = aes(x = date, y = score, group = series, color = series))
p1 <- p + geom_line() + theme(legend.position = "top") +
labs(x = "Date",
y = "Index",
color = "Series")
p <- ggplot(data = fredts, mapping = aes(x=date, y = sp500_i - monbase_i))
p2 <- p + geom_line() + labs(x = "Date",
x = "Difference")
cowplot::plot_grid(p1, p2, nrow = 2, rel_heights = c(0.75, 0.25), align = "v")
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
head(yahoo)
## Year Revenue Employees Mayer
## 1 2004 3574 7600 No
## 2 2005 5257 9800 No
## 3 2006 6425 11400 No
## 4 2007 6969 14300 No
## 5 2008 7208 13600 No
## 6 2009 6460 13900 No
p <- ggplot(data = yahoo, mapping = aes(x = Employees, y = Revenue))
p + geom_path(color = "gray80") +
geom_text(aes(color = Mayer, label = Year),
size = 3, fontface = "bold") +
theme(legend.position = "bottom") +
labs(color = "Mayer is CEO", x = "Employees", y ="Revenue (Millions)",
title = "Yahoo Employees vs Revenues, 2004-2014") +
scale_y_continuous(labels = scales::dollar) +
scale_x_continuous(labels = scales::comma)
p <- ggplot(data = yahoo, mapping = aes(x = Year, y = Revenue/Employees))
p + geom_line(color = "gray60", size = 2) +
geom_vline(xintercept = 2012) +
annotate("text", x = 2013, y = 0.44, label = " Mayer becomes CEO", size = 2.5) +
labs(x = "Year\n", y = "Revenue/Employees",
title = "Yahoo Revenue to Employee Ratio, 2004-2014")
head(studebt)
## # A tibble: 6 x 4
## Debt type pct Debtrc
## <ord> <fct> <int> <ord>
## 1 Under $5 Borrowers 20 Under $5
## 2 $5-$10 Borrowers 17 $5-$10
## 3 $10-$25 Borrowers 28 $10-$25
## 4 $25-$50 Borrowers 19 $25-$50
## 5 $50-$75 Borrowers 8 $50-$75
## 6 $75-$100 Borrowers 3 $75-$100
p_xlab <- "Amount Owed, in thousands of Dollars"
p_title <- "Outstanding Student Loans"
p_subtitle <- "44 million borrowers owe a total of $1.3 trillion"
p_caption <- "Source: FRB NY"
f_labs <- c(`Borrowers` = "Percent of \nall Borrowers",
`Balances` = "Percent of \nall Balances")
p <- ggplot(data = studebt, mapping = aes(x = Debt, y = pct/100, fill = type))
p + geom_bar(stat = "identity") + scale_fill_brewer(type = "qual", palette = "Dark2") +
scale_y_continuous(labels = scales::percent) +
guides(fill = FALSE) +
theme(strip.text.x = element_text(face = "bold")) +
labs(y = NULL, x = p_xlab,
caption = p_caption,
title = p_title,
subtitle = p_subtitle) +
facet_grid(~type, labeller = as_labeller(f_labs)) +
coord_flip()
library(viridis)
p <- ggplot(studebt, aes(y = pct/100, x = type, fill = Debtrc))
p + geom_bar(stat = "identity", color = "gray80") +
scale_x_discrete(labels = as_labeller(f_labs)) +
scale_y_continuous(labels = scales::percent) +
scale_fill_viridis(discrete = TRUE) +
guides(fill = guide_legend(reverse = TRUE,
title.position = "top",
label.position = "bottom",
keywidth = 3,
nrow = 1)) +
labs(x = NULL, y = NULL,
fill = "Amount Owed, in thousands of dollars",
caption = p_caption,
title = p_title,
subtitle = p_subtitle) +
theme(legend.position = "top",
axis.text.y = element_text(face = "bold", hjust = 1, size = 12),
axis.ticks.length = unit(0, "cm"),
panel.grid.major.y = element_blank()) +
coord_flip()